流量数据长期遭受缺失和腐败的困扰,从而导致随后的智能运输系统(ITS)应用程序的准确性和效用降低。注意到流量数据的固有低级属性,大量研究将缺少的流量数据恢复为低级张量完成(LRTC)问题。由于LRTC中的秩最小化的非跨性别性和离散性,现有方法要么用凸面替代等级代替等级替代等级函数,要么以涉及许多参数的非convex替代物,或近似等级。在这项研究中,我们提出了一个用于交通数据恢复的无参数的非凸张量完成模型(TC-PFNC),其中设计了基于日志的松弛项以近似张量代数级别。此外,以前的研究通常认为观察结果是可靠的,没有任何异常值。因此,我们通过对潜在的流量数据异常值进行建模,将TC-PFNC扩展到了强大的版本(RTC-PFNC),该数据可以从部分和损坏的观测值中恢复缺失的值并在观测中删除异常。基于交替的方向乘数法(ADMM)详细阐述了TC-PFNC和RTC-PFNC的数值解。在四个现实世界流量数据集上进行的广泛实验结果表明,所提出的方法在缺失和损坏的数据恢复中都优于其他最先进的方法。本文使用的代码可在以下网址获得:https://github.com/younghe49/t-ITSPFNC。
translated by 谷歌翻译
利用标签相关性对于多标签分类很重要。先前的方法主要通过将标签矩阵转换为具有低升级矩阵分解的潜在标签空间来捕获高阶标签相关性。但是,标签矩阵通常是一个全等级或近似的全级矩阵,使得低级别的分解不合适。此外,在潜在空间中,标签相关性将成为隐式。为此,我们提出了一种简单而有效的方法,以明确描绘高阶标签相关性,同时保持标签矩阵的高级别。此外,我们通过输入的局部几何结构同时估计标签相关性和推断模型参数,以实现相互增强。超过十个基准数据集的比较研究验证了所提出的算法在多标签分类中的有效性。利用的高阶标签相关性与常识在经验上是一致的。我们的代码可在https://github.com/601175936/homi上公开获取。
translated by 谷歌翻译
现有的深度嵌入聚类工作仅考虑最深层的学习功能嵌入,因此未能利用来自群集分配的可用辨别信息,从而产生性能限制。为此,我们提出了一种新颖的方法,即深入关注引导的图形聚类与双自我监督(DAGC)。具体地,DAGC首先利用异质性 - 方向融合模块,以便于在每个层中自适应地集成自动编码器的特征和图形卷积网络,然后使用尺度明智的融合模块动态地连接不同层中的多尺度特征。这种模块能够通过基于注意的机制学习歧视特征。此外,我们设计了一种分配明智的融合模块,它利用群集分配直接获取聚类结果。为了更好地探索集群分配的歧视信息,我们开发了一种双重自我监督解决方案,包括软自我监督策略,具有三联kullback-Leibler发散损失和具有伪监督损失的硬自我监督策略。广泛的实验验证了我们的方法在六个基准数据集中始终如一地优于最先进的方法。特别是,我们的方法通过最佳基线超过18.14%的方法将ARI提高。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译
This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.
translated by 谷歌翻译
Human parsing aims to partition humans in image or video into multiple pixel-level semantic parts. In the last decade, it has gained significantly increased interest in the computer vision community and has been utilized in a broad range of practical applications, from security monitoring, to social media, to visual special effects, just to name a few. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions are still confusing. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, by introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page, to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.
translated by 谷歌翻译
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
translated by 谷歌翻译
We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance in three value alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and ease for interactive error correction. Extensive human evaluations further confirm its effectiveness.
translated by 谷歌翻译